Abstract

In what follows we study the non-asymptotic behavior of the well-known estimators AIC, BIC and EDC, in contrast with the Markov chain order estimator named Global Dependency Level (GDL).

The GDL estimator is based on a different principle, which makes it behave quite differently. It is strongly consistent and more efficient than the (inconsistent) AIC, and it outperforms the well-established, consistent BIC and EDC, mainly on relatively small samples.

The estimators mentioned above essentially evaluate the Markov chain sample through different multivariate deterministic functions. In the log-likelihood approach, as in (11),

$$L_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k)\big) = C_1\big(C_2 - X_{a_1^\eta}(i,k)\log X_{a_1^\eta}(i,k)\big) \tag{1}$$

with deterministic function

$$\mathcal{L}_{[(n,a_1^\eta)]}\big(x(i,j)\big) = C_1\big(C_2 - \pi(i)\,x(i,j)\log x(i,j)\big), \qquad C_1, C_2 \text{ constants},$$

or, in the GDL approach, as in (13),

$$G_{[(n,a_1^\eta)(i,j)]}\big(\ldots, X_{a_1^\eta}(s,t), \ldots\big) = \mathcal{G}_{[(n,a_1^\eta)(i,j)]}\big(\ldots, x(s,t), \ldots\big)$$

with deterministic function

$$\mathcal{G}_{[(n,a_1^\eta)(i,j)]}\big(\ldots, x(s,t), \ldots\big) = \frac{\Big(x(i,j) - \big[\sum_{t=1}^m x(i,t)\big]\big[\sum_{s=1}^m x(s,j)\big]\Big)^2}{\big[\sum_{t=1}^m x(i,t)\big]\big[\sum_{s=1}^m x(s,j)\big]}.$$

These functions are analyzed in Section 3, where they exhibit different structural properties. This makes clear the essential differences between the variances of the two estimators, which lead to quite dissimilar performance, mainly for samples of moderate size.



1 Introduction

A Markov chain is a discrete stochastic process $X = \{X_n\}_{n \ge 0}$ with state space $E$, of cardinality $|E| < \infty$, for which there is a $k \ge 1$ such that, for all $(x_1, \ldots, x_n) \in E^n$, $n > k$,

$$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1, \ldots, X_k = x_k) \prod_{i=k+1}^{n} Q(x_i \mid x_{i-k}, \ldots, x_{i-1})$$

for suitable transition probabilities $Q(\cdot \mid \cdot)$. The class of processes that satisfy the above condition for a given $k \ge 1$ will be denoted by $\mathcal{M}_k$, and $\mathcal{M}_0$ will denote the class of i.i.d. processes. The order of a Markov chain in $\bigcup_{k \ge 0} \mathcal{M}_k$ is the smallest integer $\kappa$ such that $X = \{X_n\}_{n \ge 0} \in \mathcal{M}_\kappa$.



Over the last few decades there has been a great deal of research on the estimation of the order of a Markov chain, starting with M.S. Bartlett [8], P.G. Hoel, I.J. Good, T.W. Anderson & L.A. Goodman, and P. Billingsley [9], [10], among others; more recently, H. Tong [26], G. Schwarz [24], R.W. Katz [19], I. Csiszár & P. Shields [13] and L.C. Zhao et al. [27] have contributed new Markov chain order estimators.
Since 1973, H. Akaike's [1] entropic information criterion, known as AIC, has had a fundamental impact on statistical model evaluation problems. Tong applied the AIC, for example, to the problem of estimating the order of autoregressive processes, autoregressive integrated moving average processes, and Markov chains. The Akaike-Tong (AIC) estimator was derived as an asymptotically approximate estimate of the Kullback-Leibler information discrepancy and provides a useful tool for evaluating models estimated by the maximum likelihood method. Later on, Katz derived the asymptotic distribution of the estimator and showed its inconsistency, proving that there is a positive probability of overestimating the true order no matter how large the sample size. Nevertheless, AIC is the most widely used and successful Markov chain order estimator at present, mainly because it is more efficient than BIC for small samples.



2 Essentials on Some Estimators

2.1 Maximum Likelihood Methods

The main consistent alternative, the BIC estimator, does not perform well on relatively small samples, as pointed out by Katz [19] and Csiszár & Shields [13]. It is natural to expect that increasing the complexity of the Markov chain (size of the state space and order) significantly increases the sample size required to identify the unknown order, even though it is often difficult to obtain sufficiently large samples.

Katz (1981) [19] obtained the asymptotic distribution of $\hat k_{\mathrm{AIC}}$ and proved its inconsistency, showing that there is a positive probability of overestimating the order; see also Shibata (1976) [25]. In contrast, Schwarz (1978) [24] and Zhao (2001) [27] proved strong consistency of the estimators $\hat k_{\mathrm{BIC}}$ and $\hat k_{\mathrm{EDC}}$, respectively.

Clearly, for a given $\eta$, AIC($\eta$) [26], BIC($\eta$) [21] and EDC($\eta$) [27] contain much of the information concerning the sample's relative dependency; nevertheless, numerical simulations as well as theoretical considerations anticipate a great deal of variability for small samples.

Let $X_1^n = (X_1, \ldots, X_n)$ be a sample from a multiple stationary Markov chain $X = \{X_n\}_{n \ge 1}$ of unknown order $\kappa$. Assume that $X$ takes values in a finite state space $E = \{1, 2, \ldots, m\}$ with transition probabilities given by

$$p(x_{\kappa+1} \mid x_1^\kappa) = P\big(X_{n+1} = x_{\kappa+1} \mid X_{n-\kappa+1}^n = x_1^\kappa\big) > 0, \tag{2}$$

where $x_1^\kappa = (x_1, \ldots, x_\kappa) \in E^\kappa$.
Define

$$N(x_1^l \mid X_1^n) = \sum_{j=1}^{n-l+1} \mathbb{1}\big(X_j = x_1, \ldots, X_{j+l-1} = x_l\big), \tag{3}$$

i.e. the number of occurrences of $x_1^l$ in $X_1^n$. If $l = 0$ we take $N(\cdot \mid X_1^n) = n$. The sums are taken over positive terms $N(x_1^{l+1} \mid X_1^n) > 0$; otherwise we adopt the convention $0/0 = 0 \cdot \infty = 0$.
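As a minimal sketch of the counting function (3), assuming states are encoded as integers and the sample is a Python list, the hypothetical helper `count_occurrences` (not from the paper) counts overlapping windows of length l. The empty word returns n, following the convention above.

```python
from typing import Sequence


def count_occurrences(word: Sequence[int], sample: Sequence[int]) -> int:
    """N(x_1^l | X_1^n): number of (possibly overlapping) start positions
    j = 1, ..., n - l + 1 at which the sample reads the word x_1^l."""
    l, n = len(word), len(sample)
    if l == 0:
        return n  # convention N(. | X_1^n) = n for the empty word
    target = tuple(word)
    return sum(1 for j in range(n - l + 1) if tuple(sample[j:j + l]) == target)


sample = [1, 2, 1, 2, 1, 1]
print(count_occurrences([1, 2], sample))  # 2
print(count_occurrences([], sample))      # 6
```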



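Definition 2.1. For $a_1^\eta = (a_1, \ldots, a_\eta) \in E^\eta$ and $j \in E$, let $X_{a_1^\eta}$ be the empirical random variables extracted from the Markov chain sample $X_1^n = (X_1, \ldots, X_n)$,

$$X_{a_1^\eta} : X_1^n \longrightarrow \big(X_{a_1^\eta}(1), \ldots, X_{a_1^\eta}(j), \ldots, X_{a_1^\eta}(m)\big) \tag{4}$$

and

$$X_{a_1^\eta}(i,\cdot) : X_1^n \longrightarrow \big(X_{a_1^\eta}(i,1), \ldots, X_{a_1^\eta}(i,j), \ldots, X_{a_1^\eta}(i,m)\big),$$

with

$$X_{a_1^\eta}(j) = \frac{N(a_1^\eta j \mid X_1^n)}{N(a_1^\eta \mid X_1^n)}, \qquad X_{a_1^\eta}(i,j) = \frac{N(a_1^\eta i j \mid X_1^n)}{N(a_1^\eta i \mid X_1^n)}, \qquad 1 \le i, j \le m. \tag{5}$$

♦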
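The empirical variables (5) can be sketched in the same way. The helper `empirical_transitions` below is hypothetical; it uses $N(a_1^\eta \mid X_1^n)$ from (3) as the denominator and applies the convention 0/0 = 0 to contexts that never occur.

```python
from collections import Counter
from itertools import product


def empirical_transitions(sample, eta, m):
    """Map each context a in E^eta, E = {1, ..., m}, to the vector
    (X_a(1), ..., X_a(m)) with X_a(j) = N(a j | X_1^n) / N(a | X_1^n).
    Contexts that never occur get a zero vector (convention 0/0 = 0)."""
    n = len(sample)
    if eta == 0:
        ctx = Counter({(): n})  # N(. | X_1^n) = n for the empty word
    else:
        ctx = Counter(tuple(sample[j:j + eta]) for j in range(n - eta + 1))
    ext = Counter(tuple(sample[j:j + eta + 1]) for j in range(n - eta))
    probs = {}
    for a in product(range(1, m + 1), repeat=eta):
        denom = ctx[a]
        probs[a] = [ext[a + (j,)] / denom if denom else 0.0
                    for j in range(1, m + 1)]
    return probs


probs = empirical_transitions([1, 2, 1, 2, 1, 1], 1, 2)
print(probs[(1,)])  # [0.25, 0.5]
```

Because $N(a_1^\eta \mid X_1^n)$ also counts an occurrence of the context at the very end of the sample, a row may sum to slightly less than one: in the example the context (1,) occurs four times but is followed by a symbol only three times.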



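Let us define, for the order $\eta$, the log-likelihood function

$$\log L(\eta) = \sum_{a_1^\eta \in E^\eta} N(a_1^\eta \mid X_1^n) \sum_{j=1}^{m} X_{a_1^\eta}(j) \log X_{a_1^\eta}(j), \qquad X_{a_1^\eta}(j) = \frac{N(a_1^\eta j \mid X_1^n)}{N(a_1^\eta \mid X_1^n)}. \tag{6}$$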
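A sketch of (6), assuming the same integer encoding of states: substituting $X_{a_1^\eta}(j) = N(a_1^\eta j)/N(a_1^\eta)$ turns each inner term into $N(a_1^\eta j)\log\big(N(a_1^\eta j)/N(a_1^\eta)\big)$, so only the words actually observed in the sample need to be visited. The name `log_likelihood` is illustrative.

```python
import math
from collections import Counter


def log_likelihood(sample, eta):
    """log L(eta) = sum_a N(a) sum_j X_a(j) log X_a(j), with X_a(j) = N(aj)/N(a),
    rewritten as a sum over observed words (a, j) of N(aj) log(N(aj) / N(a))."""
    n = len(sample)
    if eta == 0:
        ctx = Counter({(): n})  # N(. | X_1^n) = n for the empty context
    else:
        ctx = Counter(tuple(sample[j:j + eta]) for j in range(n - eta + 1))
    ext = Counter(tuple(sample[j:j + eta + 1]) for j in range(n - eta))
    # only positive counts N(a j) > 0 appear, so 0 log 0 terms are skipped
    return sum(c * math.log(c / ctx[word[:-1]]) for word, c in ext.items())


print(log_likelihood([1, 2, 1, 2, 1, 1], 1))
```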
The estimators based on likelihood and penalty functions, for Markov chains of order $\kappa$, are defined under the hypothesis

There exists a known $B$ such that $0 \le \kappa \le B$,

as

$$\hat k_{\mathrm{AIC}} = \arg\min\{\mathrm{AIC}(\eta) ;\ \eta = 0, 1, \ldots, B\} \tag{7}$$
$$\hat k_{\mathrm{BIC}} = \arg\min\{\mathrm{BIC}(\eta) ;\ \eta = 0, 1, \ldots, B\} \tag{8}$$
$$\hat k_{\mathrm{EDC}} = \arg\min\{\mathrm{EDC}(\eta) ;\ \eta = 0, 1, \ldots, B\} \tag{9}$$

where

$$\mathrm{AIC}(\eta) = -2\log L(\eta) + 2|E|^\eta(|E|-1),$$
$$\mathrm{BIC}(\eta) = -2\log L(\eta) + 2|E|^\eta(|E|-1)\,\frac{\log(n)}{2},$$
$$\mathrm{EDC}(\eta) = -2\log L(\eta) + 2|E|^\eta(|E|-1)\,\frac{|E|\log\log(n)}{2(|E|-1)}.$$

Note that $\mathrm{AIC}(\eta) \le \mathrm{EDC}(\eta) \le \mathrm{BIC}(\eta)$.
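The argmin rules (7)-(9) then reduce to a loop over $\eta$. The sketch below assumes the penalty terms exactly as written above, with $|E| = m$, and takes precomputed values $\log L(0), \ldots, \log L(B)$ as input; `order_criteria` is a hypothetical name.

```python
import math


def order_criteria(loglik, n, m):
    """Given loglik[eta] = log L(eta) for eta = 0..B, return the AIC, BIC and
    EDC order estimates as argmin over eta (penalties as written above)."""
    def argmin(values):
        # smallest index among ties
        return min(range(len(values)), key=values.__getitem__)

    aic, bic, edc = [], [], []
    for eta, ll in enumerate(loglik):
        base = 2 * m**eta * (m - 1)  # 2 |E|^eta (|E| - 1)
        aic.append(-2 * ll + base)
        bic.append(-2 * ll + base * math.log(n) / 2)
        edc.append(-2 * ll + base * m * math.log(math.log(n)) / (2 * (m - 1)))
    return argmin(aic), argmin(bic), argmin(edc)


print(order_criteria([-1100.0, -1090.0, -1085.0], n=1000, m=3))  # (1, 0, 1)
```

In this toy input the heavier BIC penalty already prefers $\eta = 0$, while AIC and EDC stop at $\eta = 1$, which is the ordering of penalties stated above.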



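Finally, let us fix $a_1^\eta$ and consider the function $L_{[(n,a_1^\eta)]}$ defined as:

$$L_{[(n,a_1^\eta)]}\big(\ldots, X_{a_1^\eta}(i,j), \ldots\big) = N(a_1^\eta \mid X_1^n)\Big(\sum_{i=1}^m X_{a_1^\eta}(i)\log X_{a_1^\eta}(i) - \sum_{i=1}^m \frac{N(a_1^\eta i \mid X_1^n)}{N(a_1^\eta \mid X_1^n)} \sum_{j=1}^m X_{a_1^\eta}(i,j)\log X_{a_1^\eta}(i,j)\Big). \tag{10}$$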

j'=i 



Later on, in Section 3.1, we shall analyse the behavior and derivatives of $\mathcal{L}_{[(n,a_1^\eta)(i,k)]}$, which is just a generic representation of $L_{[(n,a_1^\eta)(i,k)]}$:

$$\mathcal{L}_{[(n,a_1^\eta)(i,k)]} : (0,1) \longrightarrow \mathbb{R}_+, \qquad \mathcal{L}_{[(n,a_1^\eta)(i,k)]}\big(x(i,k)\big) = C_1\big(C_2 - x(i,k)\log x(i,k)\big), \tag{11}$$

such that

$$L_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k)\big) = \mathcal{L}_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k)\big),$$

where $C_1 = N(a_1^\eta \mid X_1^n)$ and $C_2 = \sum_{i=1}^m X_{a_1^\eta}(i)\log X_{a_1^\eta}(i)$ are assumed constant with respect to the variables $x(i,j)$, with $x(i,j) = X_{a_1^\eta}(i,j)$ as in (5), $\kappa$ the Markov chain order, and

$$\pi(i) = \frac{N(a_1^\eta i \mid X_1^n)}{N(a_1^\eta \mid X_1^n)}, \qquad 1 \le i, j \le m.$$



2.2 $\chi^2$-divergence estimator

We now briefly recall this new Markov chain order estimator, referring the reader to [6] for details.

Definition 2.2. Let $X_1^n = \{X_j\}_{j=1}^n$ be a sample of a Markov chain $X$ of order $\kappa \ge 0$, let $X_{a_1^\eta}(i,j)$ be the corresponding empirical random variables, $\eta \ge 0$, and let $\chi^2(X_{a_1^\eta})$ be the random variable defined as follows:

$$\chi^2(X_{a_1^\eta}) = N(a_1^\eta \mid X_1^n) \sum_{i=1}^m \sum_{j=1}^m \frac{\Big(X_{a_1^\eta}(i,j) - \big[\sum_{t=1}^m X_{a_1^\eta}(i,t)\big]\big[\sum_{s=1}^m X_{a_1^\eta}(s,j)\big]\Big)^2}{\big[\sum_{t=1}^m X_{a_1^\eta}(i,t)\big]\big[\sum_{s=1}^m X_{a_1^\eta}(s,j)\big]}.$$

Assume that $V$ is a $\chi^2$ random variable with $(m-1)^2$ degrees of freedom, and let $\mathcal{P}$ be the continuous, strictly decreasing function $\mathcal{P} : \mathbb{R}_+ \to [0,1]$,

$$\mathcal{P}(x) = P(V > x), \qquad x \in \mathbb{R}_+.$$
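A minimal sketch of the $\chi^2$-divergence, under two labeled assumptions: the $m \times m$ table `counts[i][j]` stands for the joint counts $N(i\,a_1^\eta\,j \mid X_1^n)$ (the form recalled in Section 4), and $N(a_1^\eta \mid X_1^n)$ is replaced by the table total unless `n_a` is supplied. With that choice the statistic coincides with Pearson's chi-square statistic for independence of rows and columns. The function name is hypothetical.

```python
def chi2_divergence(counts, n_a=None):
    """chi^2(X_a) = N(a) * sum_{i,j} (x(i,j) - h(i) v(j))^2 / (h(i) v(j)),
    where x(i,j) = counts[i][j] / total, h = row sums and v = column sums of x.
    n_a plays the role of N(a | X_1^n) and defaults to the table total."""
    total = sum(map(sum, counts))
    if total == 0:
        return 0.0
    n_a = total if n_a is None else n_a
    m = len(counts)
    x = [[c / total for c in row] for row in counts]
    h = [sum(row) for row in x]
    v = [sum(x[i][j] for i in range(m)) for j in range(m)]
    stat = 0.0
    for i in range(m):
        for j in range(m):
            e = h[i] * v[j]
            if e > 0:  # empty rows/columns are skipped (0/0 taken as 0)
                stat += (x[i][j] - e) ** 2 / e
    return n_a * stat


print(chi2_divergence([[10, 0], [0, 10]]))  # 20.0
```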
The Local Dependency Level $LDL_n(a_1^\eta)$ and the Global Dependency Level $GDL_n(\eta)$ are defined in [6]; in particular,

$$LDL_n(a_1^\eta) = \mathcal{P}\left(\frac{\chi^2(X_{a_1^\eta})}{2\log(\log(n))}\right).$$
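The Local Dependency Level can be sketched once a chi-square survival function is available. To keep the block dependency-free, the hypothetical helper `chi2_sf` evaluates $P(V > x)$ through the power series of the regularized incomplete gamma function (with SciPy installed, `scipy.stats.chi2.sf(x, df)` would serve instead). The normalisation by $2\log(\log(n))$ follows the formula for $LDL_n$ written above; how the local levels are aggregated into $GDL_n(\eta)$ is left to [6] and is not reproduced here.

```python
import math


def chi2_sf(x, df):
    """P(V > x) for V ~ chi-square(df): 1 - P(a, z), where P(a, z) is the
    regularized lower incomplete gamma function, a = df / 2, z = x / 2."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z > a + 40.0 * math.sqrt(a) + 40.0:
        return 0.0  # far right tail: numerically negligible for this sketch
    term = 1.0 / a  # n = 0 term of sum_n z^n Gamma(a) / Gamma(a + n + 1)
    series = term
    k = 1
    while term > 1e-16 * series:
        term *= z / (a + k)
        series += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * series
    return min(1.0, max(0.0, 1.0 - lower))


def ldl(chi2_value, n, m):
    """LDL_n(a) = P(chi2(X_a) / (2 log(log n))), with P the survival function
    of a chi-square variable with (m - 1)^2 degrees of freedom."""
    if n < 3:
        raise ValueError("need n >= 3 so that log(log(n)) > 0")
    return chi2_sf(chi2_value / (2.0 * math.log(math.log(n))), (m - 1) ** 2)


print(ldl(20.0, n=1500, m=3))
```

Large divergences push $LDL_n$ toward 0 and small ones toward 1, which is the behavior the binary projection of Definition 2.4 relies on.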




Finally, let us define the Markov chain order estimator based on the information contained in the vector $GDL_n$.

Definition 2.3. Given a fixed number $0 < B \in \mathbb{N}$, let us define the set $S = \{0,1\}^{B+1}$ and the application $T : S \to \{-1, 0, \ldots, B\}$,

$$T(s) = -1 \quad \text{if } s_i = 1,\ i = 0, 1, \ldots, B,$$
$$T(s) = \max_{0 \le i \le B}\{i : s_i = 0\} \quad \text{otherwise}, \qquad s = (s_0, s_1, \ldots, s_B).$$

♦

Definition 2.4. Let $X_1^n = \{X_i\}_{i=1}^n$ be a sample of the Markov chain $X$ of order $\kappa$, $0 \le \kappa \le B \in \mathbb{N}$, and $\{GDL_n(i)\}_{i=0}^{B}$ as above. We define the order estimator $\hat k_{\mathrm{GDL}}(X_1^n)$ as

$$\hat k_{\mathrm{GDL}}(X_1^n) = T(a_n) + 1,$$

with $a_n \in S$ such that, for all $s \in S$,

$$\sum_{i=0}^{B}\big(GDL_n(i) - a_n(i)\big)^2 \le \sum_{i=0}^{B}\big(GDL_n(i) - s(i)\big)^2.$$

♦
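Definitions 2.3 and 2.4 amount to projecting the vector $(GDL_n(0), \ldots, GDL_n(B))$ onto $S$ and reading off the last zero. The sketch below assumes $S = \{0,1\}^{B+1}$ as written above and takes the GDL vector as input; the function names are illustrative.

```python
from itertools import product


def T(s):
    """Definition 2.3: -1 if every s_i = 1, otherwise max{i : s_i = 0}."""
    zeros = [i for i, s_i in enumerate(s) if s_i == 0]
    return max(zeros) if zeros else -1


def k_gdl(gdl):
    """Definition 2.4: k_GDL = T(a_n) + 1, where a_n is the point of
    S = {0, 1}^(B+1) closest (squared Euclidean distance) to
    gdl = (GDL_n(0), ..., GDL_n(B))."""
    a_n = min(product((0, 1), repeat=len(gdl)),
              key=lambda s: sum((g - s_i) ** 2 for g, s_i in zip(gdl, s)))
    return T(a_n) + 1


print(k_gdl([0.01, 0.02, 0.97, 0.88, 0.99, 0.91]))  # 2
```

Since $S$ is a product set, the minimiser is simply the componentwise rounding of each $GDL_n(i)$ at 1/2; the brute-force search over $2^{B+1}$ points is kept only because it mirrors the definition literally (ties go to the first candidate in lexicographic order).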



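Observe that the Local Dependency Level $LDL_n(a_1^\eta)$ relies entirely on the $\chi^2$-divergence estimator just defined, which is itself the sum of several univariate random variables $G_{[(n,a_1^\eta)(i,j)]}$, $1 \le i, j \le m$:

$$\chi^2(X_{a_1^\eta}) = N(a_1^\eta \mid X_1^n)\sum_{i=1}^m\sum_{j=1}^m G_{[(n,a_1^\eta)(i,j)]}, \qquad G_{[(n,a_1^\eta)(i,j)]} = \frac{\Big(X_{a_1^\eta}(i,j) - \big[\sum_{t=1}^m X_{a_1^\eta}(i,t)\big]\big[\sum_{s=1}^m X_{a_1^\eta}(s,j)\big]\Big)^2}{\big[\sum_{t=1}^m X_{a_1^\eta}(i,t)\big]\big[\sum_{s=1}^m X_{a_1^\eta}(s,j)\big]}. \tag{12}$$

Later on, in Section 3.2, we shall analyse the behavior of the deterministic functions $\mathcal{G}_{[(n,a_1^\eta)(i,j)]}$, $1 \le i, j \le m$, and their derivatives,

$$\mathcal{G}_{[(n,a_1^\eta)(i,j)]} : (0,1)^{2m} \longrightarrow \mathbb{R}_+, \qquad 1 \le i, j \le m,$$

with

$$\mathcal{G}_{[(n,a_1^\eta)(i,j)]}\big(\ldots, x(s,t), \ldots\big) = \frac{\Big(x(i,j) - \big[\sum_{t=1}^m x(i,t)\big]\big[\sum_{s=1}^m x(s,j)\big]\Big)^2}{\big[\sum_{t=1}^m x(i,t)\big]\big[\sum_{s=1}^m x(s,j)\big]},$$

such that

$$G_{[(n,a_1^\eta)(i,j)]}\big(\ldots, X_{a_1^\eta}(s,t), \ldots\big) = \mathcal{G}_{[(n,a_1^\eta)(i,j)]}\big(\ldots, x(s,t), \ldots\big), \tag{13}$$

with $x(s,t) = X_{a_1^\eta}(s,t)$, $1 \le s, t \le m$.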



3 Deterministic Accessory Functions

3.1 Functions Related with the AIC Estimator

Let us calculate the derivatives of the deterministic function $\mathcal{L}_{(n,a_1^\eta)}$ as in (11), which, for the sake of notational simplicity and for fixed $n$ and $a_1^\eta$, we temporarily rename:

$$L = \mathcal{L}_{[(n,a_1^\eta)(i,k)]}, \qquad L : D_L \subset (0,1) \longrightarrow \mathbb{R},$$
$$D_L = \{x(i,k) : x(i,k) \in (0,1)\},$$
$$L(x) = x(i,k)\log\big(x(i,k)\big). \tag{14}$$



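First-order derivatives:

$$\frac{\partial L}{\partial x(i,k)} = 1 + \log\big(x(i,k)\big), \qquad 1 \le i, k \le m-1. \tag{15}$$

Second-order derivatives:

$$\frac{\partial^2 L}{\partial x^2(i,k)} = \frac{1}{x(i,k)}, \quad 1 \le i, k \le m-1, \qquad \frac{\partial^2 L}{\partial x(j,l)\,\partial x(i,k)} = 0, \quad (j,l) \ne (i,k),\ 1 \le i, j, k, l \le m-1, \tag{16}$$

respectively.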
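The derivatives (15) and (16) are easy to check numerically. The sketch below compares them with central finite differences at a few points of (0, 1); the helper names are hypothetical and the step size is an arbitrary choice.

```python
import math


def L(x):
    """(14): L(x) = x log x on (0, 1)."""
    return x * math.log(x)


def dL(x):
    """(15): first derivative 1 + log x."""
    return 1.0 + math.log(x)


def d2L(x):
    """(16): second derivative 1 / x, which blows up as x -> 0."""
    return 1.0 / x


def central_diff(f, x, h=1e-6):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (0.05, 0.3, 0.9):
    # absolute errors of the finite-difference first and second derivatives
    print(x, abs(central_diff(L, x) - dL(x)), abs(central_diff(dL, x) - d2L(x)))
```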



Later on we shall obtain the gradient vector and the Hessian matrix $\nabla_L\big(\Lambda_{a_1^\eta}(o)\big)$, $H_L\big(\Lambda_{a_1^\eta}(o)\big)$ of the function $L$ at the point

$$\Lambda_{a_1^\eta}(o) = \big(\ldots, E(X_{a_1^\eta}(i,k)), \ldots\big).$$
3.2 Functions Related with the GDL Estimator

Herein, we shall consider the deterministic multivariate set of functions, as in (13),

$$\mathcal{G}_{[(n,a_1^\eta)(i,k)]}\big(x(i,k), h(i), v(k)\big) = \frac{\big(x(i,k) - h(i)v(k)\big)^2}{h(i)v(k)}, \qquad 1 \le i, k \le m,$$

where $h(i) = \sum_{t=1}^m x(i,t)$ and $v(k) = \sum_{s=1}^m x(s,k)$, which, after fixing $(i,k)$, we temporarily rename as follows:

$$G = \mathcal{G}_{[(n,a_1^\eta)(i,k)]}, \qquad G : D_G \subset (0,1)^3 \longrightarrow \mathbb{R},$$
$$D_G = \{\omega \in (0,1)^3 : \omega = (x,h,v)\},$$
$$G(x,h,v) = \frac{(x - hv)^2}{hv}. \tag{17}$$



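First-order derivatives:

$$\frac{\partial G}{\partial x} = \frac{2x}{hv} - 2, \qquad \frac{\partial G}{\partial h} = -\frac{x^2}{h^2 v} + v, \qquad \frac{\partial G}{\partial v} = -\frac{x^2}{h v^2} + h. \tag{18}$$

Second-order derivatives:

$$\frac{\partial^2 G}{\partial x^2} = \frac{2}{hv}, \qquad \frac{\partial^2 G}{\partial h^2} = \frac{2x^2}{h^3 v}, \qquad \frac{\partial^2 G}{\partial v^2} = \frac{2x^2}{v^3 h},$$
$$\frac{\partial^2 G}{\partial h\,\partial x} = -\frac{2x}{h^2 v}, \qquad \frac{\partial^2 G}{\partial v\,\partial x} = -\frac{2x}{h v^2}, \qquad \frac{\partial^2 G}{\partial v\,\partial h} = \frac{x^2}{h^2 v^2} + 1. \tag{19}$$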
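Because second derivatives of a rational function are easy to get wrong by hand, the sketch below compares (18) and (19) with central finite differences at an arbitrary interior point. The helper names and the test point are illustrative; the Hessian is assembled in the variable order (x, h, v).

```python
def G(x, h, v):
    """(17): G(x, h, v) = (x - h v)^2 / (h v)."""
    return (x - h * v) ** 2 / (h * v)


def grad_G(x, h, v):
    """(18): partial derivatives with respect to (x, h, v)."""
    return (2 * x / (h * v) - 2,
            -x**2 / (h**2 * v) + v,
            -x**2 / (h * v**2) + h)


def hess_G(x, h, v):
    """(19): Hessian in the variable order (x, h, v)."""
    hxh = -2 * x / (h**2 * v)
    hxv = -2 * x / (h * v**2)
    hhv = x**2 / (h**2 * v**2) + 1
    return ((2 / (h * v), hxh, hxv),
            (hxh, 2 * x**2 / (h**3 * v), hhv),
            (hxv, hhv, 2 * x**2 / (h * v**3)))


def num_grad(f, p, eps=1e-6):
    """Central-difference gradient of f at the point p."""
    grad = []
    for k in range(len(p)):
        up, dn = list(p), list(p)
        up[k] += eps
        dn[k] -= eps
        grad.append((f(*up) - f(*dn)) / (2 * eps))
    return grad


p = (0.12, 0.3, 0.35)  # arbitrary test point in (0, 1)^3
print(max(abs(a - b) for a, b in zip(num_grad(G, p), grad_G(*p))))
```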



Likewise, as in the previous subsection, we get the gradient vector and the Hessian matrix $\nabla_G\big(\Gamma_{a_1^\eta}(o)\big)$, $H_G\big(\Gamma_{a_1^\eta}(o)\big)$ of the function $G(x,h,v)$ at the point

$$\Gamma_{a_1^\eta}(o) = \Big(E\big(X_{a_1^\eta}(i,k)\big), E\big(H_{a_1^\eta}(i)\big), E\big(V_{a_1^\eta}(k)\big)\Big).$$



4 Multivariate Variances

Focusing on $G_{[(n,a_1^\eta)(i,k)]}$ for fixed $(i,k)$, with $\kappa$ the order of the Markov chain and $\mathcal{G}_{[(n,a_1^\eta)(i,k)]}$ as in (13), let us recall the empirical random variables introduced in Definition 2.1,

$$X_{a_1^\eta}(i,k) = \frac{N(i\,a_1^\eta\,k \mid X_1^n)}{\sum_{s=1}^m\sum_{t=1}^m N(s\,a_1^\eta\,t \mid X_1^n)}.$$

Observe that the Markov chain we are interested in has order $\kappa$, and it is clear that

$$X_{a_1^\eta}(i,k),\ 1 \le i, k \le m, \qquad G_{[(n,a_1^\eta)(i,k)]},\ 1 \le i, k \le m-1, \tag{20}$$

are independent, with

$$G_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k), H_{a_1^\eta}(i), V_{a_1^\eta}(k)\big) = \frac{\big(X_{a_1^\eta}(i,k) - H_{a_1^\eta}(i)V_{a_1^\eta}(k)\big)^2}{H_{a_1^\eta}(i)V_{a_1^\eta}(k)},$$

where $H_{a_1^\eta}(i) = \sum_{t=1}^m X_{a_1^\eta}(i,t)$ and $V_{a_1^\eta}(k) = \sum_{s=1}^m X_{a_1^\eta}(s,k)$; moreover, for an adequate sample size $n$, these random variables satisfy

$$E\big(H_{a_1^\eta}(i)\big) \approx \frac{1}{m}, \qquad E\big(V_{a_1^\eta}(k)\big) \approx m\,E\big(X_{a_1^\eta}(s,k)\big), \quad 1 \le s \le m.$$

For the sake of notational simplicity, we temporarily rename $\mathcal{G}_{[(n,a_1^\eta)(i,k)]}\big(x(i,k), h(i), v(k)\big)$ as $G(x,h,v)$; its derivatives, as well as the variances, covariances and related information of $\{X_{a_1^\eta}(i,k), H_{a_1^\eta}(i), V_{a_1^\eta}(k)\}$, are as follows:
$$E\big(H_{a_1^\eta}(i)V_{a_1^\eta}(k)\big) \approx E\big(H_{a_1^\eta}(i)\big)E\big(V_{a_1^\eta}(k)\big) \approx \frac{1}{m}\big(m\,E(X_{a_1^\eta}(s,k))\big) \approx E\big(X_{a_1^\eta}(s,k)\big);$$

the first-order terms of the expansion involve

$$\sum_{\alpha,\beta \in \{x,h,v\}} \frac{\partial G}{\partial \alpha}\,\frac{\partial G}{\partial \beta}\,\mathrm{cov}(\alpha,\beta),$$

with

$$\mathrm{cov}\big(X_{a_1^\eta}(i,k), H_{a_1^\eta}(i)\big) \approx \mathrm{cov}\big(X_{a_1^\eta}(i,k), V_{a_1^\eta}(k)\big) \approx \mathrm{cov}\big(H_{a_1^\eta}(i), V_{a_1^\eta}(k)\big) \approx \sigma^2_{X_{a_1^\eta}(i,k)}.$$

Likewise, the mixed terms involve

$$\sum_{\alpha,\beta,\gamma \in \{x,h,v\}} \frac{\partial G}{\partial \alpha}(x,h,v)\,\frac{\partial^2 G}{\partial \beta\,\partial \gamma}(x,h,v)\,\mathrm{cov}(\alpha, \beta\gamma),$$

with

$$\mathrm{cov}\big(X_{a_1^\eta}(i,k), X^2_{a_1^\eta}(i,k)\big) \approx \mathrm{cov}\big(X_{a_1^\eta}(i,k), H^2_{a_1^\eta}(i)\big) \approx \mathrm{cov}\big(X_{a_1^\eta}(i,k), V^2_{a_1^\eta}(k)\big) \approx \sigma^2_{X_{a_1^\eta}(i,k)}.$$

Finally, for the second-order terms,

$$\mathrm{cov}\big(X^2_{a_1^\eta}(i,k), H^2_{a_1^\eta}(i)\big) \approx \mathrm{cov}\big(X^2_{a_1^\eta}(i,k), V^2_{a_1^\eta}(k)\big) \approx \mathrm{cov}\big(H^2_{a_1^\eta}(i), V^2_{a_1^\eta}(k)\big) \approx \sigma^2_{X_{a_1^\eta}(i,k)}.$$



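Let us denote by $B \subset \mathbb{R}^3$ the unit ball centered at the point $\Gamma_{a_1^\eta}(o)$, with

$$\omega = \big(X_{a_1^\eta}(i,k), H_{a_1^\eta}(i), V_{a_1^\eta}(k)\big), \qquad \Delta = \omega - \Gamma_{a_1^\eta}(o).$$

By Taylor's theorem ([5]) there exists $0 < c_g < 1$ such that

$$G_{[(n,a_1^\eta)(i,k)]}(\omega) = G_{[(n,a_1^\eta)(i,k)]}\big(\Gamma_{a_1^\eta}(o)\big) + \nabla G_{[(n,a_1^\eta)(i,k)]}\big(\Gamma_{a_1^\eta}(o)\big)\cdot\Delta + \frac{1}{2}\,\Delta^{\top} H_{G_{[(n,a_1^\eta)(i,k)]}}\big(\Gamma_{a_1^\eta}(o) + c_g\Delta\big)\,\Delta,$$

so that the variance of $G_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k), H_{a_1^\eta}(i), V_{a_1^\eta}(k)\big)$ is

$$\sigma^2_{G_{[(n,a_1^\eta)(i,k)]}} = \mathrm{Var}\Big(\nabla G_{[(n,a_1^\eta)(i,k)]}\big(\Gamma_{a_1^\eta}(o)\big)\cdot\Delta + \frac{1}{2}\,\Delta^{\top} H_{G_{[(n,a_1^\eta)(i,k)]}}\big(\Gamma_{a_1^\eta}(o) + c_g\Delta\big)\,\Delta\Big),$$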
where $\omega \in B$ and, by (19),

$$H_G\big(x(i,k), h(i), v(k)\big) = \begin{pmatrix} \frac{2}{hv} & -\frac{2x}{h^2 v} & -\frac{2x}{h v^2} \\ -\frac{2x}{h^2 v} & \frac{2x^2}{h^3 v} & \frac{x^2}{h^2 v^2} + 1 \\ -\frac{2x}{h v^2} & \frac{x^2}{h^2 v^2} + 1 & \frac{2x^2}{h v^3} \end{pmatrix}, \qquad x = x(i,k),\ h = h(i),\ v = v(k),$$

for $1 \le i, k \le m-1$, and, by (20), the total variance is

$$\sigma^2_{G} = \sum_{i=1}^{m-1}\sum_{k=1}^{m-1} \sigma^2_{G_{[(n,a_1^\eta)(i,k)]}}.$$



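Exactly as before, we can obtain the total variance of $L_{(n,a_1^\eta)}\big(X_{a_1^\eta}(i,k)\big)$, defining

$$\sigma^2_{L_{[(n,a_1^\eta)(i,k)]}} = \mathrm{Var}\Big(L_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k)\big) - L_{[(n,a_1^\eta)(i,k)]}\big(E(X_{a_1^\eta}(i,k))\big)\Big),$$

where, for some $0 < c_l < 1$,

$$L_{[(n,a_1^\eta)(i,k)]}\big(X_{a_1^\eta}(i,k)\big) - L_{[(n,a_1^\eta)(i,k)]}\big(E(X_{a_1^\eta}(i,k))\big) = \nabla L\big(E(X_{a_1^\eta}(i,k))\big)\big(X_{a_1^\eta}(i,k) - E(X_{a_1^\eta}(i,k))\big) + \frac{1}{2} H_L\Big(E(X_{a_1^\eta}(i,k)) + c_l\,\big[x(i,k) - E(X_{a_1^\eta}(i,k))\big]\Big)\big(X_{a_1^\eta}(i,k) - E(X_{a_1^\eta}(i,k))\big)^2,$$

with $\nabla L(x(i,k)) = 1 + \ln(x(i,k))$ and $H_L(x(i,k)) = 1/x(i,k)$ by (15) and (16), and $\sigma^2_{X_{a_1^\eta}(i,k)}$ the variance of $X_{a_1^\eta}(i,k)$, $1 \le i, k \le m$.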
Here $x(i,k)$ is as above, and the variance, for $1 \le i, k \le m-1$, is

$$\sigma^2_{L_{[(n,a_1^\eta)(i,k)]}} \approx \big[1 + \ln(x(i,k))\big]^2\,\sigma^2_{X_{a_1^\eta}(i,k)} + \sigma^2\left(\frac{\big(X_{a_1^\eta}(i,k) - E(X_{a_1^\eta}(i,k))\big)^2}{2\big[E(X_{a_1^\eta}(i,k)) + c_l\,[x(i,k) - E(X_{a_1^\eta}(i,k))]\big]}\right),$$

and, by (20), the total variance is

$$\sigma^2_{L} = \sum_{i=1}^{m-1}\sum_{k=1}^{m-1}\sigma^2_{L_{[(n,a_1^\eta)(i,k)]}}.$$



5 Conclusion

The purpose of this work was the comparative analysis of the non-asymptotic behavior of the estimators AIC, BIC and EDC versus the estimator of Section 2.2 (Definition 2.4), named Global Dependency Level (GDL); for details see [6].

The GDL uses a function different from the log-likelihood function applied to the sample, which makes the estimator behave quite differently. It is strongly consistent and more efficient than the (inconsistent) AIC, and it outperforms the well-established, consistent BIC and EDC, mainly on reasonably small samples.

The estimators just mentioned are based on the composition of the empirical random variables with two different deterministic functions: the log-likelihood approach, as in (11), with

$$\mathcal{L}_{[(n,a_1^\eta)(i,k)]}\big(x(i,k)\big) = C_1\big(C_2 - \pi(i)\,x(i,k)\log x(i,k)\big),$$

or the GDL approach, as in (12), with

$$\mathcal{G}_{[(n,a_1^\eta)(i,j)]}\big(\ldots, x(s,t), \ldots\big) = \frac{\Big(x(i,j) - \big[\sum_{t=1}^m x(i,t)\big]\big[\sum_{s=1}^m x(s,j)\big]\Big)^2}{\big[\sum_{t=1}^m x(i,t)\big]\big[\sum_{s=1}^m x(s,j)\big]}.$$
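Since the sample only depends on the Markov chain $X$ and its size $n$, once the sample is chosen the only sources of the estimators' variance are the following random variables:

$$L_{[(n,a_1^\eta)(i,k)]} = C_1\big(C_2 - X_{a_1^\eta}(i,k)\log X_{a_1^\eta}(i,k)\big)$$

and

$$G_{[(n,a_1^\eta)(i,k)]} = \frac{\Big(X_{a_1^\eta}(i,k) - \big[\sum_{t=1}^m X_{a_1^\eta}(i,t)\big]\big[\sum_{s=1}^m X_{a_1^\eta}(s,k)\big]\Big)^2}{\big[\sum_{t=1}^m X_{a_1^\eta}(i,t)\big]\big[\sum_{s=1}^m X_{a_1^\eta}(s,k)\big]},$$

with variances, for $1 \le i, k \le m-1$,

$$\sigma^2_{L_{[(n,a_1^\eta)(i,k)]}} \approx \big[1 + \ln(x(i,k))\big]^2\sigma^2_{X_{a_1^\eta}(i,k)} + \sigma^2\left(\frac{\big(X_{a_1^\eta}(i,k) - E(X_{a_1^\eta}(i,k))\big)^2}{2\big[E(X_{a_1^\eta}(i,k)) + c_l\,[x(i,k) - E(X_{a_1^\eta}(i,k))]\big]}\right)$$

and

$$\sigma^2_{G_{[(n,a_1^\eta)(i,k)]}} = \mathrm{Var}\Big(\nabla G\big(\Gamma_{a_1^\eta}(o)\big)\cdot\Delta + \frac{1}{2}\Delta^{\top}H_G\big(\Gamma_{a_1^\eta}(o) + c_g(\omega - \Gamma_{a_1^\eta}(o))\big)\Delta\Big),$$

respectively.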



Finally, the reader should notice that the log-likelihood based estimators are heavily affected by $\log(x(i,k))$: when the Markov chain intrinsically presents empirical random variables $X_{a_1^\eta}(i,k)$ with small expectations, the fluctuating values of $x(i,k)$, converging to $E(X_{a_1^\eta}(i,k)) \approx 0$, impose on the coefficients $[1 + \log(x(i,k))]^2$, and hence on the variance $\sigma^2_{L_{(n,a_1^\eta)}}$, a great deal of instability.
The following Appendix presents a few examples exhibiting such an anomaly.
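To put a number on this instability, the sketch below (a hypothetical helper, not part of the paper) tabulates only the coefficient $[1 + \log x]^2$ for a few small values of $x$, not a full variance. For $x < 1/e$ the coefficient increases as $x$ decreases and grows without bound as $x \to 0$.

```python
import math


def likelihood_coefficient(x):
    """Coefficient [1 + log x]^2 multiplying the variance of X_a(i,k)
    in the first-order term of sigma^2_L."""
    return (1.0 + math.log(x)) ** 2


for x in (0.3, 0.05, 0.005, 0.0005):
    print(f"x = {x:<7} [1 + log x]^2 = {likelihood_coefficient(x):.3f}")
```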



6 Appendix

6.1 Numerical Evidence

In what follows we compare the non-asymptotic performance, mainly for small samples, of some of the most used Markov chain order estimators. It is quite intuitive that the random information regarding the order of a Markov chain is spread over an exponentially growing set of $m^{B+1}$ empirical distributions, where $B$ is the maximum integer $\eta$, as in $a_1^\eta = (a_1 a_2 \ldots a_\eta)$. It seems reasonable to think that a small viable sample, i.e. a sample able to retrieve enough information to estimate the chain order, should have size $n \approx O(m^{B+1})$. Keeping in mind that for the present numerical simulation the maximum length used is $B = 5$, from now on the sample sizes for $|E| = 3$ and $|E| = 4$ should be $n \approx 1{,}500$ and $n \approx 5{,}000$, respectively.
287 respectively. 

The following numerical simulation, based on an algorithm due to Raftery [23], starts with the generation of a Markov chain transition matrix $Q = (q_{i_1 i_2 \ldots i_\kappa; i_{\kappa+1}})$ with entries

$$q_{i_1 i_2 \ldots i_\kappa; i_{\kappa+1}} = \sum_{t=1}^{\kappa} \lambda_t\, R(i_{\kappa+1}, i_t), \qquad 1 \le i_1, \ldots, i_{\kappa+1} \le m, \tag{21}$$

where the matrix

$$R(i,j),\ 1 \le i, j \le m, \qquad \sum_{i=1}^{m} R(i,j) = 1,\ 1 \le j \le m,$$

and the positive numbers

$$\{\lambda_t\}_{t=1}^{\kappa}, \qquad \sum_{t=1}^{\kappa}\lambda_t = 1,$$

are arbitrarily chosen in advance.
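A sketch of this simulation step, assuming 0-based states, a user-supplied $R$ whose columns sum to one, and uniformly drawn initial states (the initialisation is not specified above). The names `mtd_transition` and `simulate`, and the example $R$, are hypothetical; the code builds every row of $Q$ from (21) and then draws a chain with Python's `random.choices`.

```python
import random
from itertools import product


def mtd_transition(R, lam):
    """Entries q(i_1 ... i_k; j) = sum_t lam[t] * R[j][i_t] as in (21).
    R: m x m list with columns summing to 1; lam: k >= 1 positive weights
    summing to 1. States are 0-based here: E = {0, ..., m - 1}."""
    m, k = len(R), len(lam)
    if k < 1:
        raise ValueError("(21) needs order k >= 1; for k = 0 give Q directly")
    return {past: [sum(lam[t] * R[j][past[t]] for t in range(k)) for j in range(m)]
            for past in product(range(m), repeat=k)}


def simulate(Q, k, n, seed=0):
    """Draw a sample of length n from the order-k chain with transitions Q;
    the first k states are drawn uniformly (an arbitrary initialisation)."""
    rng = random.Random(seed)
    m = len(next(iter(Q.values())))
    x = [rng.randrange(m) for _ in range(k)]
    while len(x) < n:
        x.append(rng.choices(range(m), weights=Q[tuple(x[-k:])])[0])
    return x[:n]


R = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # illustrative R
Q = mtd_transition(R, [0.5, 0.5])
print(Q[(0, 1)])
print(simulate(Q, 2, 20))
```

Because the columns of $R$ sum to one and the weights $\lambda_t$ sum to one, every row of $Q$ is automatically a probability vector; order-0 chains such as those of Case I are given by $Q$ directly rather than through (21).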

Once the matrix $Q = (q_{i_1 i_2 \ldots i_\kappa; i_{\kappa+1}})$ is obtained, two hundred replications of the Markov chain sample of size $n$, state space $E$ and transition matrix $Q$ are generated to compare the performance of GDL($\eta$) against the standard, well-known and established order estimators mentioned above. Finally, after applying all estimators to each of the replicated samples, the results of the two hundred replications are recorded in the form of tables.



Case I: Markov chain examples with $\kappa = 0$, $|E| = 3$.

First, we choose the matrices $\{Q_1, Q_2, Q_3\}$ to produce samples with sizes $500 \le n \le 2{,}000$, originating from Markov chains of order $\kappa = 0$ with quite different probability distributions.



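$$Q_1 = \begin{pmatrix} 0.33 & 0.335 & 0.335 \\ 0.33 & 0.335 & 0.335 \\ 0.33 & 0.335 & 0.335 \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} 0.05 & 0.475 & 0.475 \\ 0.05 & 0.475 & 0.475 \\ 0.05 & 0.475 & 0.475 \end{pmatrix}, \qquad Q_3 = \begin{pmatrix} 0.05 & 0.05 & 0.90 \\ 0.05 & 0.05 & 0.90 \\ 0.05 & 0.05 & 0.90 \end{pmatrix}.$$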



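$Q_1$: $|E| = 3$, $\kappa = 0$, $\lambda_i = 1/3$, $i = 1, 2, 3$ (percentage of the 200 replications returning order k; "-" = empty cell).

     n = 500                     n = 1,000                   n = 1,500
k    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL
0    75.5%  100%   100%   99%    80%    100%   100%   99.5%  71.5%  100%   100%   99%
1    24.5%  -      -      1%     18%    -      -      0.5%   22.5%  -      -      1%
2    -      -      -      -      2%     -      -      -      6%     -      -      -
3    -      -      -      -      -      -      -      -      -      -      -      -
4    -      -      -      -      -      -      -      -      -      -      -      -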

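$Q_2$: $|E| = 3$, $\kappa = 0$, $\lambda_i = 1/3$, $i = 1, 2, 3$ (percentage of the 200 replications returning order k; "-" = empty cell).

     n = 1,000                   n = 1,500                   n = 500
k    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL
0    63.5%  100%   100%   99%    63%    100%   100%   99%    59%    100%   100%   99%
1    29%    -      -      1%     34.5%  -      -      1%     37%    -      -      1%
2    7.5%   -      -      -      2.5%   -      -      -      4%     -      -      -
3    -      -      -      -      -      -      -      -      -      -      -      -
4    -      -      -      -      -      -      -      -      -      -      -      -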
$Q_3$: $|E| = 3$, $\kappa = 0$, $\lambda_i = 1/3$, $i = 1, 2, 3$ (percentage of the 200 replications returning order k; "-" = empty cell).

     n = 1,000                   n = 1,500                   n = 2,000
k    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL
0    43%    100%   100%   98%    47%    100%   99.5%  96%    46%    100%   100%   97%
1    53%    -      -      2%     51.5%  -      0.5%   4%     50.5%  -      -      2%
2    4%     -      -      -      1.5%   -      -      -      3.5%   -      -      1%
3    -      -      -      -      -      -      -      -      -      -      -      -
4    -      -      -      -      -      -      -      -      -      -      -      -

Notice that, for a fixed sample size $n \in \{500, 1{,}000, 1{,}500, 2{,}000\}$, the order estimator $\hat k_{\mathrm{AIC}}$ steadily overestimates the real order $\kappa = 0$, with the excess depending on the probability distribution of the Markov chain. In contrast, the order estimators $\hat k_{\mathrm{BIC}}$, $\hat k_{\mathrm{EDC}}$ and $\hat k_{\mathrm{GDL}}$ show consistent performance, mostly obtaining the right order regardless of the sample size and the generating matrix. The improved behavior of $\hat k_{\mathrm{BIC}}$ and $\hat k_{\mathrm{EDC}}$ most likely depends on their correcting factors, $\frac{\log(n)}{2}$ and $\frac{|E|\log\log(n)}{2(|E|-1)}$, which tend to decrease the estimated order.

For $|E| = 4$, the greater complexity of a Markov chain of order $\kappa = 3$ requires larger sample sizes for the estimators to achieve some reliability. Finally, we choose the matrices $\{Q_5, Q_6, Q_7\}$ to produce samples of size $n = 5{,}000$, originating from Markov chains of order $\kappa \in \{2, 3, 0\}$, as in the previous cases.



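$$\begin{pmatrix} 0.05 & 0.05 & 0.05 & 0.85 \\ 0.05 & 0.05 & 0.85 & 0.05 \\ 0.05 & 0.85 & 0.05 & 0.05 \\ 0.85 & 0.05 & 0.05 & 0.05 \end{pmatrix}, \qquad \begin{pmatrix} 0.05 & 0.05 & 0.05 & 0.85 \\ 0.05 & 0.05 & 0.05 & 0.85 \\ 0.05 & 0.05 & 0.05 & 0.85 \\ 0.05 & 0.05 & 0.05 & 0.85 \end{pmatrix}$$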
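$|E| = 4$, $n = 5{,}000$: $Q_5$ with $\kappa = 2$, $\lambda_i = 1/2$ ($i = 1, 2$); $Q_6$ with $\kappa = 3$, $\lambda_i = 1/3$ ($i = 1, 2, 3$); $Q_7$ with $\kappa = 0$, $\lambda_i = 1/3$ ($i = 1, 2, 3$) (percentage of the 200 replications returning order k; "-" = empty cell).

     Q5 (κ = 2)                  Q6 (κ = 3)                  Q7 (κ = 0)
k    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL    AIC    BIC    EDC    GDL
0    -      -      -      -      -      -      -      -      85%    100%   100%   100%
1    -      -      -      -      -      -      -      -      15%    -      -      -
2    100%   100%   100%   100%   -      99%    -      4%     -      -      -      -
3    -      -      -      -      100%   1%     100%   96%    -      -      -      -
4    -      -      -      -      -      -      -      -      -      -      -      -
5    -      -      -      -      -      -      -      -      -      -      -      -
6    -      -      -      -      -      -      -      -      -      -      -      -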

For $|E| = 4$ and $\kappa = 0$, $\hat k_{\mathrm{AIC}}$ apparently keeps overestimating the order to some degree, while $\hat k_{\mathrm{BIC}}$, in the example with $\kappa = 3$, severely underestimates the order, presumably due to the excessive weight of the correcting factor $\frac{\log(n)}{2}$. On the contrary, $\hat k_{\mathrm{EDC}}$ and $\hat k_{\mathrm{GDL}}$ behave quite well in the same setting.